METHOD AND APPARATUS FOR GENERATING A SUMMARY FROM A
DOCUMENT IMAGE

CROSS-REFERENCE TO RELATED APPLICATIONS 

Cross-reference is made to U.S. Patent Application Serial No.
09/AAA.AAA, entitled "Method And Apparatus For Processing Documents"
(Attorney Docket No. D/99632), which is hereby incorporated herein by
reference.

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates to processing a scanned image of a document (for
example a paper document) to generate a document summary from the
scanned image.

2. Description of Related Art 

There are many occasions in which it would be desirable to compile
automatically a summary of a document. Several approaches for such
systems have been proposed in the prior art.

For example, European Patent Application EP 0902379 A2 describes a
technique in which a user is able to mark certain words or phrases in an
electronic version of a document (for example ASCII text), which the system
then extracts to compile a document summary. However, such a system
requires the user to work with an electronic version of the document.
Furthermore, the document must already exist in the electronic form before
any words or phrases can be selected by the user.

Regarding the summarizing of paper documents (or scanned images of
paper documents), reference may be made to the following documents:

U.S. Patent Nos. 5,638,543 and 5,689,716 describe systems in which
paper documents are scanned and the resulting images are processed using
optical character recognition (OCR) to produce a machine-readable version of
the document. A summary is generated by allocating "scores" to sentences
depending on critical or thematic words detected in the sentence. The
summary is generated from the sentences having the best scores.

U.S. Patent No. 5,848,191 describes a system similar to U.S. Patent No.
5,689,716 using scores to rank sentences, the score being dependent on the
number of thematic words occurring in a sentence. However, in U.S. Patent
No. 5,848,191, the summary is generated directly from the scanned image
without performing OCR.

U.S. Patent No. 5,491,760 describes a system in which significant words,
phrases and graphics in a document image are recognized using automatic or
interactive morphological image recognition techniques. A document
summary or an index can be produced based on the identified significant
portions of the document image.

"Summarization Of Imaged Documents Without OCR" by Chen and 
Bloomberg, in Computer Vision and Image Understanding, Vol. 70, No. 3,
June 1998, on pages 307-320, describes an elaborate technique based on
feature extraction and scoring sentences based on the values of a set of
discrete features. Prior information is used in the form of feature vector values
obtained from summaries compiled by professional human summary
compilers. The sentences to be included in the summary are chosen
according to the score of the sentence.

The above paper-based techniques all employ variations of statistical
scoring to decide (either on the basis of OCR text or on the basis of image
maps) which features, or sentences, should be extracted for use in the
compiled summary.
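
For orientation only, the kind of statistical scoring referred to above might be sketched as follows. This is a minimal illustration operating on already-recognized text, not the method of any of the cited references; the function names, the stop-word list and the choice of "most frequent words" as thematic words are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "is", "in", "to"})


def score_sentences(text, num_thematic=3):
    """Return (score, sentence) pairs; score = count of thematic words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    # Assumption: the most frequent non-stop words serve as thematic words.
    thematic = {w for w, _ in Counter(words).most_common(num_thematic)}
    scored = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        scored.append((sum(t in thematic for t in tokens), sentence))
    return scored


def top_sentences(text, k=1, num_thematic=3):
    """Return the k best-scoring sentences, highest score first."""
    ranked = sorted(score_sentences(text, num_thematic),
                    key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in ranked[:k]]
```

Such scoring selects sentences without any input from the reader.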

SUMMARY OF THE INVENTION

In contrast to the above techniques, one aspect of the present invention 
is to generate a summary of a captured (e.g., scanned) image of a document 
on the basis of detected handwritten or electronic annotations made to a 
document prior to scanning. 

In more detail, the captured image is processed to detect annotations
made to the document prior to image capture. The detected annotations can
be used to identify features, or text, for use in summarizing that document.
Additionally or alternatively, the detected annotations in one document can be
used to identify features, or text, for use in summarizing a different document.

BRIEF DESCRIPTION OF THE DRAWINGS 



These and other aspects of the invention will become apparent from the
following description read in conjunction with the accompanying drawings
wherein the same reference numerals have been applied to like parts and in
which:

Fig. 1 is a schematic block diagram of a first embodiment for processing 
a paper document to generate a summary of the document; 

Fig. 2 is a schematic flow diagram showing the process for generating
the summary;

Fig. 3 is a schematic view of an annotated page of a document; 

Fig. 4 is an enlarged schematic view of a portion of Fig. 3 illustrating 
extraction of a sentence; and 

Fig. 5 is a schematic diagram illustrating options for displaying the
summary.

DETAILED DESCRIPTION 

Referring to Fig. 1, a system 10 is illustrated for generating a summary
from a paper document 12. The system comprises an optical capture device
14 for capturing a digital image (for example a bitmap image) of each page of
the paper document 12. The capture device 14 may be in the form of a digital
camera, or a document scanner.

The system 10 also includes a processor 16 for processing the captured
digital image to generate a summary therefrom. The processor is coupled to
one or more operator input devices 18 (for example, a keyboard, or a pointing
device) and also to one or more output devices 20 for outputting the
generated summary. The output devices 20 may, for example, include a
display unit and/or a printer.

In contrast to the prior art, one of the principles of this embodiment is to
generate the summary on the basis of annotations made by hand to the paper
document prior to scanning (or capture) by the optical capture device 14. The
processor 16 processes the digital image to detect hand annotations
indicating areas of interest in the paper document. Text or other features
indicated by the annotations are extracted and used to compile the summary.
The summary therefore reflects the areas of interest identified by the hand
annotations in the paper document.

Referring to Fig. 2, the process for creating the summary by the
processor 16 comprises a first step 30 of identifying, in the captured digital
image, the annotations made by the user. Suitable techniques for identifying
annotations are described, for example, in U.S. Patent Nos. 5,570,435,
5,748,805 and 5,384,863, the contents of which are incorporated herein by
reference. These patents disclose techniques for distinguishing regular
machine printing from handwritten marks and annotations.

Fig. 3 illustrates the kinds of hand annotations that can typically be
identified, which include underlining 32, circling 34, bracketing 36, margin
bracketing or marking 38, cross-through 40, anchored arrows indicating place
changes 42, and handwritten notes or insertions 44.

At step 46 (Fig. 2), interpretation of the annotations is carried out. The
level of interpretation may vary from one embodiment to another, depending
on the complexity of annotation permitted by the system 10. For example,
simple word underlining 32 or circling 34 does not need interpretation, as the
words are identified directly by the annotations. Bracketing 36 and margin
marking 38 require only simple interpretation, namely identifying the entire
text spanned by the brackets or marking.

Cross-through annotations 40 are preferably interpreted as a negative
annotation, for excluding the crossed-through text from the summary. This
may be regarded in one respect as being equivalent to no annotation at all
(and hence not drawing any focus to the text for inclusion in the summary).
However, a cross-through annotation 40 also provides a way of excluding one
or more words near a highlighted word from being included as part of the
contextual text (Fig. 4).

Place change arrows 42 and handwritten notes or insertions 44 also
require interpretation to identify the respective positions identified by the
annotations.
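
As an illustration of the interpretation at step 46, the following minimal sketch maps each detected annotation to a set of included or excluded words. It assumes that the annotation detector has already assigned each annotation a kind and a span of word positions; the names `Kind`, `Annotation` and `interpret` are hypothetical and do not appear in the referenced patents. Place change arrows 42 and handwritten notes 44 are deliberately left out of the sketch.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    UNDERLINE = auto()      # 32 in Fig. 3
    CIRCLE = auto()         # 34
    BRACKET = auto()        # 36
    MARGIN_MARK = auto()    # 38
    CROSS_THROUGH = auto()  # 40


@dataclass(frozen=True)
class Annotation:
    kind: Kind
    start: int  # index of the first word covered (assumed known)
    end: int    # index of the last word covered, inclusive


def interpret(annotations, num_words):
    """Return (included, excluded) sets of word indices."""
    included, excluded = set(), set()
    for a in annotations:
        first = max(0, a.start)
        last = min(num_words - 1, a.end)
        span = set(range(first, last + 1))
        if a.kind is Kind.CROSS_THROUGH:
            excluded |= span   # negative annotation
        else:
            included |= span   # underline, circle, bracket or margin mark
    # A cross-through overrides any positive annotation on the same words.
    return included - excluded, excluded
```

In this sketch a cross-through always takes precedence, which matches its use for excluding words from otherwise highlighted text.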

At step 48 (Fig. 2), regions of the digital image identified by the
interpreted annotations are extracted for use in the summary. Each region is
referred to herein as a "feature", and is an image map of the extracted region
from the digital image. In addition, each feature is tagged with a pointer or
address indicating the place in the originally scanned image from which it is
extracted (or copied).
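
A feature as described above could, for example, be represented as a cropped image map together with a pointer back to its source location. The sketch below assumes the page is held as a list of pixel rows and that the region is a rectangle; the `Feature` type and `extract_feature` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Bitmap = List[List[int]]  # rows of pixel values


@dataclass
class Feature:
    image_map: Bitmap        # copy of the extracted region
    page: int                # page of the scanned document
    origin: Tuple[int, int]  # (row, column) of the region's top-left corner


def extract_feature(page_image, page, top, left, height, width):
    """Copy a rectangular region and tag it with its source location."""
    rows = page_image[top:top + height]
    image_map = [row[left:left + width] for row in rows]
    return Feature(image_map=image_map, page=page, origin=(top, left))
```

Because the region is copied rather than cut, the originally scanned image remains intact and the pointer can later be used to retrieve surrounding context.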

If an annotation identifies only a single word, or a short phrase, then the
extracted feature for that annotation is preferably expanded to include
additional contextual information or text for the annotation. Normally, the
feature will be expanded to include the sentence 50 (Fig. 4) around the
annotation. Therefore, at step 48, the processor 16 identifies the location of
full stops and other machine printed marks or boundaries indicating the start
and finish of a sentence.
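
The expansion of an annotated word to its enclosing sentence can be sketched as a search outwards for sentence boundaries. The example below works on a list of word strings for simplicity and assumes that terminating marks have already been detected and attached to the preceding word; in the system described, the equivalent search would be made on machine printed marks in the image. The function name is hypothetical.

```python
def expand_to_sentence(words, index, terminators=(".", "!", "?")):
    """Return (start, end) indices of the sentence containing words[index]."""
    start = index
    # Walk left until the previous word closes an earlier sentence.
    while start > 0 and not words[start - 1].endswith(terminators):
        start -= 1
    end = index
    # Walk right until a word that closes the current sentence.
    while end < len(words) - 1 and not words[end].endswith(terminators):
        end += 1
    return start, end
```

For example, with the words of "The cat sat. A dog barked loudly. Birds sing." and the annotated word "dog" at index 4, the function returns (3, 6), covering "A dog barked loudly."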

Although Figs. 3 and 4 only illustrate annotation of text in a document,
one or more graphic portions of the document may also be annotated to be
included in the summary. In such a case, at step 48, an image map
corresponding to the annotated graphic "feature" is extracted.

At step 52, the summary is compiled from the extracted features. The
summary may be compiled in the form of image maps of the extracted
features, or text portions of the features may be OCR processed to generate
character-codes for the text. Similarly, handwritten notes or insertions may be
OCR processed to generate character-codes, or they may be used as image
maps.

During compilation, any further interpretation of the annotations which
may be required can be carried out. For example, any crossed-through text
can be deleted (removed) from the summary (for example, the
crossed-through text 40 in Fig. 4).

Additionally, during compilation, identically annotated features may be
itemized, for example, with bullets. For example, sentences containing circled
words may be organized together as a bulleted list. Such an operation is
preferably a user controllable option, but this can provide a powerful
technique enabling a user to group items of information together in the
summary simply by using the same annotation for marking the information in
the original document.

Additionally, during compilation, parts of the summary may be
highlighted as important, based on the annotations made by hand. For
example, annotations such as an exclamation mark (54 in Fig. 3) or double
underlining may be included in the summary as importance marking, for
example, by bold or underlined text, or text in a different color.
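
The compilation operations described above (removal of crossed-through text, optional grouping of identically annotated features, and importance marking) might be combined as in the following sketch. It assumes the features have already been OCR processed into word lists; the dictionary keys and the "[!]" importance prefix are placeholders chosen for the example, standing in for bold, underlined or colored text.

```python
def compile_summary(features, group_bullets=True):
    """Compile summary lines from OCR-processed features.

    Each feature is a dict with keys (all hypothetical):
      'kind'      - annotation type, e.g. 'circle' or 'underline'
      'words'     - list of words in the extracted sentence
      'crossed'   - optional set of crossed-through word indices
      'important' - optional bool (exclamation mark, double underline)
    """
    rendered = []
    for f in features:
        crossed = f.get("crossed", set())
        # Delete crossed-through words from the extracted text.
        text = " ".join(w for i, w in enumerate(f["words"]) if i not in crossed)
        if f.get("important", False):
            text = "[!] " + text
        rendered.append((f["kind"], text))

    if not group_bullets:  # user option off: keep document order
        return [text for _, text in rendered]

    groups = {}  # insertion order follows first appearance in the document
    for kind, text in rendered:
        groups.setdefault(kind, []).append(text)
    lines = []
    for texts in groups.values():
        prefix = "* " if len(texts) > 1 else ""
        lines.extend(prefix + t for t in texts)
    return lines
```

Grouping is exposed as a flag because the text treats it as a user controllable option.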

At step 56, the compiled summary is outputted, for example, on the 
user's display or printer. 

In this embodiment, the system 10 provides a plurality of layered detail
levels in a window 57 for the summary, indicated in Fig. 5. These layers may
be applied either during compilation, or during outputting of the summary
information.

The lowest detail level 58 merely includes any subject headings 
extracted from the document. 

By clicking on any subject heading, the subject heading is expanded to
its second detail level 60 to generate the text summary of that appropriate
section of the document. The second detail level 60 only includes text
features. However, by clicking again, the summary is expanded (third detail
level 62) to include non-text features as part of the summary, such as
annotated figures from that section of the document.

By clicking on any sentence, the summary is expanded (fourth detail
level 64) to display further context for the sentence, for example, by displaying
the paragraph containing the sentence.

In a final layer (fifth detail level 66), the annotation associated with any 
sentence in the document may be "retrieved" by clicking on the sentence. 
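
The five detail levels described above can be modeled as progressively richer renderings of the same underlying data. The sketch below is a simplification: it applies one level to a whole section, whereas the embodiment expands individual headings and sentences on a click. The `Section` and `Sentence` types and the bracketed placeholders for figures and annotations are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    text: str        # extracted sentence (level 2)
    paragraph: str   # containing paragraph, shown as context (level 4)
    annotation: str  # description of the hand annotation (level 5)


@dataclass
class Section:
    heading: str                                       # level 1
    sentences: List[Sentence] = field(default_factory=list)
    figures: List[str] = field(default_factory=list)   # non-text (level 3)


def render(section, level):
    """Return the summary lines for one section at detail level 1 to 5."""
    if not 1 <= level <= 5:
        raise ValueError("level must be between 1 and 5")
    lines = [section.heading]
    if level >= 2:
        for s in section.sentences:
            lines.append(s.paragraph if level >= 4 else s.text)
            if level >= 5:
                lines.append("  [annotation: " + s.annotation + "]")
    if level >= 3:
        lines.extend("[figure: " + f + "]" for f in section.figures)
    return lines
```

Levels 4 and 5 rely on the pointer stored with each feature, since the paragraph and the annotation must be retrieved from the originally scanned image.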

In an alternate embodiment, the plurality of layered detail levels for the
summary may be accessed simply by clicking on each level of detail set forth
in the window 57 shown in Fig. 5. That is, window 57 may be used both to
indicate the current level of detail being used to summarize a document and
to access a particular level of detail.

In the present embodiment, the summary is based on annotations made
to the document to be summarized. However, in other embodiments, the
summary may be made based on annotations made to a different document,
for example, a previously annotated document or a master document. In such
an embodiment, a first document is annotated by hand, and the annotations
are detected and stored by the system 10. A second document is then
captured by the system, and the second document is processed based on the
annotations detected from the first document. In other words, the annotations
detected in the first document are used as a guide for generation of the
abstract of the second document (in the same manner as if the hand
annotations had been made to the second document). Further information
about this technique is described in U.S. Patent Application Serial No.
AAA.AAA (Attorney Docket No. D/99632) entitled "Method And Apparatus For
Forward Annotating Documents", which is hereby incorporated herein by
reference.
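
One possible reading of the above, offered purely as an assumption since the details are left to the referenced application, is that the regions marked on the first document are stored and then applied at the same page positions on the second document. The sketch below illustrates that reading with rectangular regions and word bounding boxes; both function names are hypothetical.

```python
def overlaps(a, b):
    """True if boxes a and b intersect; boxes are (left, top, right, bottom)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def apply_stored_annotations(stored_boxes, second_doc_words):
    """Select words of the second document under stored annotation regions.

    stored_boxes     - regions annotated on the first (master) document
    second_doc_words - list of (text, box) pairs for the second document
    Assumes both pages share the same layout and coordinate system.
    """
    return [text for text, box in second_doc_words
            if any(overlaps(box, region) for region in stored_boxes)]
```

This positional matching would suit documents sharing a common layout, such as forms; other matching strategies are equally conceivable.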

The invention has been described with reference to a particular
embodiment. Modifications and alterations will occur to others upon reading
and understanding this specification taken together with the drawings. The
embodiments are but examples, and various alternatives, modifications,
variations or improvements may be made by those skilled in the art from this
teaching which are intended to be encompassed by the following claims.
